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Abstract 

Hyperspectral image is a substitution of more than a hundred im- 
ages, called bands, of the same region. They are taken at juxtaposed 
frequencies. The reference image of the region is called Ground Truth 
map (GT). the problematic is how to find the good bands to classify 
the pixels of regions; because the bands can be not only redundant, but 
a source of confusion, and decreasing so the accuracy of classification. 
Some methods use Mutual Information (MI) and threshold, to select 
relevant bands without treatement of redundancy; others consider the 
neighbors having sensibly the same MI with the GT as redundant and 
so discarded. This is the most inconvenient of this method, because this 
avoids the advantage of hyperspectral images: some precious informa- 
tion can be discarded. In this paper well make difference between useful 
and useless redundancy. A band contains useful redundancy if it con- 
tributes to decreasing error probability. According to this scheme, we 
introduce new algorithm using also mutual information, but it retains 
only the bands minimizing the error probability of classification. To 
control redundancy, we introduce a complementary threshold. So the 
good band candidate must contribute to decrease the last error proba- 
bility augmented by the threshold. This process is a wrapper strategy; 
it gets high performance of classification accuracy but it is expensive 
than filter strategy. 

Keywords: Hyperspectral images, classification, feature selection, mutual 
information, error probability, redundancy 
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1 Introduction 

In the feature classification domain, the choice of data affects widely the re- 
sults. For the Hyperspectral image, the bands dont all contain the information; 
some bands are irrelevant like those affected by various atmospheric effects, see 
Figure. 4, and decrease the classification accuracy. And there exist redundant 
bands to complicate the learning system and product incorrect prediction [14] . 
Even the bands contain enough information about the scene they may can't 
predict the classes correctly if the dimension of space images, see Figure. 3, is 
so large that needs many cases to detect the relationship between the bands 
and the scene (Hughes phenomenon) [10]. We can reduce the dimensionality 
of hyperspectral images by selecting only the relevant bands (feature selection 
or subset selection methodology), or extracting, from the original bands, new 
bands containing the maximal information about the classes, using any func- 
tions, logical or numerical (feature extraction methodology) [11] [9]. Here we 
focus on the feature selection using mutual information. Hyperspectral images 
have three advantages regarding the multispectral images [6], 

Assertion: when we reduce hyperspectral images dimensionality, any method 
used must save the precision and high discrimination of substances given by 
hyperspectral image. 
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Figure V. Precision an dicrimination added by hyperspectral images 

In this paper we use the Hyperspectral image AVIRIS 92AV3C (Airborne 
Visible Infrared Imaging Spectrometer). [2]. It contains 220 images taken on 
the region Indiana Pine at north-western Indiana, USA [1]. The 220 called 
bands are taken between 0.4m and 2.5m. Each band has 145 lines and 145 
columns. The ground truth map is also provided, but only 10366 pixels are 
labeled fro 1 to 16. Each label indicates one from 16 classes. The zeros indicate 
pixels how are not classified yet, Figure. 2. 
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The hyperspectral image AVIRIS 92AV3C contains numbers between 955 and 
9406. Each pixel of the ground truth map has a set of 220 numbers (mea- 
sures) along the hyperspectral image. This numbers (measures) represent the 
reflectance of the pixel in each band. So the pixel is shown as vector off 220 
components. Figure .3. 
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Figure 2: The Ground Truth map of AVIRIS 92AV3C and the 16 classes 

The hyperspectral image AVIRIS 92AV3C contains numbers between 955 
and 9406. Each pixel of the ground truth map has a set of 220 numbers 
(measures) along the hyperspectral image. This numbers (measures) represent 
the reflectance of the pixel in each band. So the pixel is shown as vector off 
220 components. 

Figure. 3 shows the vector pixels notion [7]. So reducing dimensionality means 
selecting only the dimensions caring a lot of information regarding the classes. 
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Figure 3: The notion of pixel vector 

We can also note that not all classes are carrier of information. In Figure. 
4, for example, we can show the effects of atmospheric affects on bands: 155, 
220 and other bands. This hyperspectral image presents the problematic of 
dimensionality reduction. 
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2 Mutual Information based feature selection 
2.1 Definition of mutual information 

This is a measure of exchanged information between tow ensembles of random 
variables A and B : 



J(AS) = E l °92P{A,B) 



P(A,B) 
p(A).p{B) 



Considering the ground truth map, and bands as ensembles of random vari- 
ables, we calculate their interdependence. 

Geo [3] uses also the average of bands 170 to 210, to product an estimated 
ground truth map, and use it instead of the real truth map. Their curves are 
similar. This is shown at Figure 4. 
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Figure 4: Mutual information of AVIRIS with the Ground Truth map (solid 
line) and with the ground apporoximated by averaging bands 170 to 210 
(dashed line) . 



2.2 The mesure of error probability 

Fano [14] has demonstrated that as soon as mutual information of already 
selected feature has high value, the error probability of classification is de- 
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creasing, according to the formula bellow: 



H{C/X) - 1 
Log 2 (N c ) 



< Pe< 



H{C/X) 
Log 2 



with : 



H(C/X) - 1 _ H(C) - I(C;X) - 1 
Log 2 (N c ) Log 2 (N c ) 



and : 




The expression of conditional entropy H( C/X) is calculated between the 
ground truth map (i.e. the classes C) and the subset of bands candidates X. 
Nc is the number of classes. So when the features X have a higher value of 
mutual information with the ground truth map, (is more near to the ground 
truth map), the error probability will be lower. But its difficult to compute 
this conjoint mutual information I(C;X), regarding the high dimensionality 
[14]. This propriety makes Mutual Information a good criterion to measure 
resemblance between too bands, like its exploited in section II. Furthermore, 
we will interest at case of one feature candidate X. 

Corollary: for one feature X, as X approaches the ground truth 
map, the interval P e is very small. 

3 The principe of proposed algorithm based 
on inequality of Fano 

Our idea is based on this observation: the band that has higher value of Mutual 
Information with the ground truth map can be a good approximation of it. So 
we note that the subset of selected bands are the good ones, if thy can generate 
an estimated reference map, sensibly equal the ground truth map. Its clearly 
thats an Incremental Wrapper-based Subset Selection (IWSS) approach[16] 



Our process of band selection will be as following: we order the bands 
according to value of its mutual information with the ground truth map. Then 
we initialize the selected bands ensemble with the band that has highest value 
of MI. At a point of process, we build an approximated reference map C est 
with the already selected bands, and we put it instead of X for computing the 
error probability (P e ); the latest band added (at those already selected) must 
make P e decreasing, if else it will be discarded from the ensemble retained. 
Then we introduce a complementary threshold J\ to control redundancy. So 
the band to be selected must make error probability less than ( P e - T^) , 
where P e is calculated before adding it. The algorithm following shows more 



[13]. 
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detail of the process: 

Let SS be the ensemble of bands already selected and S the band candidate to be 
selected. Build es u ma tedC{) is a procedure to construct the estimated reference 
map. P e is initialized with a value P* e . X the number of bands to be selected, 
SS is empty and R = 1..220. 



Algorithm 1 Proposed for Dimentionality Reduction and Redundancy control 

while \SS\ < X do 

Select band index s S=argmax s MI(s) 
SS <- SSU S and R\ S 

C ' est = Build estimatedC (S S) 

pe = HjC/Cest) H(C/C est )-l_ 
Log 2 Log 2 {N c ) 

if Pe < Pe* - Th then 

Pe <- Pe* 
else 

SS ^ ss\s 

end if 
end while 



4 Results and analysis 

We apply this algorithm on the hyperspectral image AVIRIS 92 AV3C [1] , 50% 
of the labeled pixels are randomly chosen and used in training; and the other 
50% are used for testing classifcation [3]. The classifer used is the SVM [5] 
[12] [4]. 

The procedure to construct the estimated reference map C est is the same SVM 
classifier used for classification. So C est is the output of classification. 

4.1 Results 

Table. I shows the results obtained for several thresholds. We can see the effec- 
tiveness selection bands of our algorithm, and the important effect of avoiding 
redundancy. 
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Table 1: Results illustrate elimination of Redundancy using algorithm based 
on inequality of Fano, for thresholds (Th) 



DLLILLlb 


n nn 

U.UU 


n nm 

U.UUl 


n nns 

U.UUO 


Th 

n m ^ 

U.UIO 


n n9n 
u.uzu 


u.uou 


10 


55.43 


55.43 


55.58 


53.09 


60.06 


71.62 


18 


59.09 


59.09 


64.41 


73.70 


82.62 


90.00 


20 


63.08 


63.08 


68.50 


76.15 


84.36 


- 


25 


66.02 


66.12 


74.62 


84.41 


89.06 


- 


27 


69.47 


69.47 


76.00 


86.73 


91.70 


- 


30 


73.54 


73.54 


79.04 


88.68 


- 


- 




76.06 


76.06 


81.38 


no o n 

92.36 






40 


78.96 


79.41 


86.48 








45 


80.58 


80.60 


89.09 








50 


81.63 


81.20 


91.14 








53 


82.27 


81.22 


92.67 








60 


86.13 


86.23 










70 


86.97 


87.55 










80 


89.11 


89.42 










90 


90.55 


90.92 










100 


92.50 


93.18 










102 


92.62 


93.34 










114 


93.34 













Figure. 5 shows more detail of the accuracy curves, versus number of bands 
retained, for several thresholds. This covers all behaviors of the algorithm. 

4.2 Analysis 

Tabled and Figure. 5 allow us to comment four cases: 

First: For the highest threshold values (0.1, 0.05, 0.03 and 0.02) we obtain 
a hard selection: a few more informative bands are selected; the accuracy of 
classification is 90% with less than 20 bands selected. 

Second: For the medium threshold values (0.015, 0.012, 0.010, 0.008, 0.006), 
some redundancy is allowed, in order to made increasing the classification ac- 
curacy. 

Tired: For the small threshold values (0.001 and 0), the redundancy allowed 
becomes useless, we have the same accuracy with more bands. 
Finally: for the negative thresholds, for example -0.01, we allow all bands 
to be selected, and we have no action of the algorithm. This corresponds at 
selection bay ordering bands on mutual information . The performance is low. 
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Figure 5: Accuracy of classification using the algorithm based on inequality of 
Fano, using numerous thresholds. 



We can not here that [15] uses two axioms to characterize feature selection. 
Sufficiency axiom: the subset selected feature must be able to reproduce the 
training simples without losing information. The necessity axiom "simplest 
among different alternatives is preferred for prediction" . In the algorithm pro- 
posed, reducing error probability between the truth map and the estimated 
minimize the information loosed for the samples training and also the predi- 
cate ones. 

We note also that we can use the number of features selected like condi- 
tion to stop the search; so we can obtain an hybrid approach filter-wrapper [16]. 

Partial conclusion: 

The algorithm proposed is a very good method to reduce dimensionality of 
hyperspectral images. 

We illustrate in Figure .6, the Ground Truth map originally displayed, like 
at Figure .1, and the scene classified with our method, for threshold as 0.03, 
so 18 bands selected. 
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Figure 6: Original Grand Truth map (in the left) and the map produced bay 
our algorithm according to the threshold 0.03 i.e 18 bands (in the right). Acu- 
racy=90%. 

Table II indicates the classification accuracy of each class, for several thresh- 
olds. 



Table 2: Accuracy of classification(%) of each class for numerous thresholds 
(Th) 



CI 


ass Total 








Th 








pixels 


0.00 


0.001 


0.008 


0.015 


0.020 


0.030 


1 


54 


86.96 


82.61 


86.96 


83.96 


78.26 


86.96 


2 


1434 


91.07 


89.40 


89.54 


89.12 


88.01 


83.96 


3 


834 


89.93 


90.89 


89.69 


86.09 


83.69 


81.53 


4 


234 


96.32 


83.76 


86.32 


87.18 


87.18 


86.32 


5 


597 


95.93 


95.53 


94.34 


95.93 


95.93 


95.53 


6 


747 


98.60 


98.60 


98.32 


98.60 


98.32 


98.32 


7 


26 


84.62 


84.62 


84.62 


84.62 


84.62 


84.62 


8 


489 


98.37 


98.37 


98.78 


97.96 


98.78 


98.78 


9 


20 


100 


100 


100 


100 


100 


100 


10 


: 968 


92.15 


92.98 


91.32 


91.74 


90.91 


89.05 


11 


: 2468 


93.84 


94.17 


92.54 


92.71 


91.90 


91.25 


12 


: 614 


91.21 


93.49 


92.83 


92.18 


88.93 


87.30 


13 


: 212 


98.06 


98.06 


98.06 


98.06 


98.06 


98.06 


14 


: 1294 


97.53 


97.86 


97.22 


97.84 


97.99 


97.53 


15 


: 390 


79.52 


77.71 


75.90 


74.10 


78.92 


64.46 


16 


: 95 


93.48 


93.48 


93.48 


93.48 


93.48 


93.48 
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Comments: 

First :we can not the effectiveness of this algorithm for particularly the classes 

with a few number of pixels, for example class number 9. 

Second: we can not that 18 bands (i.e. threshold 0.03) are sufficient to detect 

materials contained in the region. Its also shown in Figure .6 

Tired: one of important comment is that most of class accuracy change lately 

when the threshold changes between 0.03 and 0.015 



5 Conclusion 

In this paper we presented the necessity to reduce the number of bands, in 
classification of Hyperspectral images. Then we have introduce the mutual in- 
formation based scheme. We carried out their effectiveness to select bands able 
to classify the pixels of ground truth. We introduce an algorithm also based 
on mutual information and using a measure of error probability (inequality of 
Fano). To choice a band, it must contributes to reduce error probability. A 
complementary threshold is added to avoid redundancy. So each band retained 
has to contribute to reduce error probability by a step equal to threshold even 
if it caries a redundant information. We can tell that we conserve the useful re- 
dundancy by adjusting the complementary threshold. The process introduced 
is able to select the good bands to classification for also the classes that have a 
few number of pixels. This algorithm is a feature selection methodology. But 
its a wrapper approach, because we use the classifier to make the estimated 
reference map. This is expenssive than Filter strategy, but it can be used 
for application that need more precision. This scheme is very interesting to 
investigate and improve, considering its performance. 
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